Finding Connected Components on Map-reduce in Logarithmic Rounds

Authors

  • Vibhor Rastogi
  • Ashwin Machanavajjhala
  • Laukik Chitnis
  • Anish Das Sarma
Abstract

Given a large graph G = (V, E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the communication of each round. Denoting by d the diameter of the graph and by n the number of nodes in the largest component, all prior techniques for map-reduce require either a linear, Θ(d), number of rounds or a quadratic, Θ(n|V| + |E|), communication per round. We propose here two efficient map-reduce algorithms: (i) Hash-Greater-to-Min, a randomized algorithm based on PRAM techniques, requiring O(log n) rounds and O(|V| + |E|) communication per round, and (ii) Hash-to-Min, a novel algorithm that provably finishes in O(log n) iterations for path graphs. The proof technique used for Hash-to-Min is novel but not tight, and the algorithm is actually faster than Hash-Greater-to-Min in practice. We conjecture that it requires 2 log d rounds and 3(|V| + |E|) communication per round, as demonstrated in our experiments. Using secondary sorting, a standard map-reduce feature, we scale Hash-to-Min to graphs with very large connected components. Our techniques for connected components can be applied to clustering as well. We propose a novel algorithm for agglomerative single-linkage clustering in map-reduce. This is the first map-reduce algorithm for clustering in at most O(log n) rounds, where n is the size of the largest cluster. We show the effectiveness of all our algorithms through detailed experiments on large synthetic as well as real-world datasets.
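The abstract names Hash-to-Min without stating its update rule. The sketch below simulates, on a single machine, the rule as it is usually described for this algorithm: each node v keeps a cluster C_v (initially v plus its neighbours), sends C_v to the minimum node in C_v, and sends that minimum to every member of C_v; each node then replaces its cluster with the union of what it received. The rule itself, the function name `hash_to_min`, the `max_rounds` cap, and the in-memory "inbox" standing in for the shuffle phase are all assumptions made for illustration, not details taken from the abstract.

```python
from collections import defaultdict


def hash_to_min(nodes, edges, max_rounds=100):
    """Single-machine simulation of a Hash-to-Min-style iteration (illustrative)."""
    # Initial cluster of each node: the node itself plus its neighbours.
    clusters = {v: {v} for v in nodes}
    for u, v in edges:
        clusters[u].add(v)
        clusters[v].add(u)

    for _ in range(max_rounds):
        # "Map" step: every node emits messages based on its current cluster.
        inbox = defaultdict(set)
        for v, cluster in clusters.items():
            v_min = min(cluster)
            inbox[v_min] |= cluster      # whole cluster goes to the minimum node
            for u in cluster:
                inbox[u].add(v_min)      # every member learns the minimum node

        # "Reduce" step: a node's new cluster is the union of its messages.
        new_clusters = {v: set(inbox[v]) for v in nodes}
        if new_clusters == clusters:     # no cluster changed: stop
            break
        clusters = new_clusters

    # At convergence the smallest node seen by v serves as its component label.
    return {v: min(clusters[v]) for v in nodes}


if __name__ == "__main__":
    # A path 1-2-3-4, an isolated node 5, and a single edge 6-7.
    labels = hash_to_min([1, 2, 3, 4, 5, 6, 7], [(1, 2), (2, 3), (3, 4), (6, 7)])
    print(labels)
```

Node identifiers only need to be mutually comparable so that `min` is defined. A real map-reduce job would shuffle the messages by destination key rather than collect them in a dictionary.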

Similar resources

MST in O(1) Rounds of the Congested Clique

We present a distributed randomized algorithm finding Minimum Spanning Tree (MST) of a given graph in O(1) rounds, with high probability, in the congested clique model. The input graph in the congested clique model is a graph of n nodes, where each node initially knows only its incident edges. The communication graph is a clique with limited edge bandwidth: each two nodes (not necessarily neigh...

A Stabilizing Algorithm for Finding Biconnected Components

In this paper, a self-stabilizing algorithm is presented for finding biconnected components of a connected undirected graph on a distributed or network model of computation. The algorithm is resilient to transient faults, therefore, it does not require initialization. The proposed algorithm is based on stabilizing BFS construction and bridge-finding algorithms. Upon termination of these algorit...

An Efficient Parallel Strategy for Computing K-terminal Reliability and Finding Most Vital Edge in 2-trees and Partial 2-trees

We develop a parallel strategy to compute K-terminal reliability in 2-trees and partial 2-trees. We also solve the problem of finding the most vital edge with respect to K-terminal reliability in partial 2-trees. Our algorithms take O(log n) time with C(m, n) processors on a CRCW PRAM, where C(m, n) is the number of processors required to find connected components of a graph with m edges and n verti...

Distributed Approximation Algorithms in Unit-Disk Graphs

We give distributed approximation schemes for the maximum matching problem and the minimum connected dominating set problem in unit-disk graphs. The algorithms are deterministic and run in a poly-logarithmic number of rounds in the message-passing model, and the approximation error can be made O(1/log^k |G|), where |G| is the order of the graph and k is a positive integer.

Automatic Service Composition Based on Graph Coloring

Web services are independent software components that service providers publish on the Internet and that are then invoked at users' request. However, in many cases, no single service in the service repository can satisfy the applicant's requirements. Service composition provides new components by using an interactive model to accelerate the programs. Prior to service comp...

Journal:
  • CoRR

Volume abs/1203.5387

Publication date 2011